Internet Archive Web Crawls

The Internet Archive discovers and captures web pages through many different web crawls.



1,815,014 results


nsdlweb
collection: 91 items, 58.2M views

This data is currently not publicly accessible.
Live Web Proxy Crawls
web: 812,009 views, 0 favorites, 0 comments

Collections news crawls v2
collection: 820 items, 49.2M views, by ximm@archive.org

Live Web Proxy Crawls
web: 1.5M views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 1.6M views, 0 favorites, 0 comments

Edu & Gov Crawl, June 2010
collection: 704 items, 26.5M views

TEST COLLECTION: Crawl of .edu and .gov sites started in June 2010.
Topic: crawldata
End of Term 2012 Web Crawls
collection: 2,383 items, 41.6M views

This collection contains web crawls performed on the US Federal Executive, Legislative & Judicial branches of government in 2012-2013.
Topics: end of term, US, Federal government, 2012, Obama
Live Web Proxy Crawls
web: 637,519 views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 722,125 views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 587,385 views, 0 favorites, 0 comments

ORG Survey Crawls
collection: 193 items, 41.5M views

Survey of .org domains. This data is currently not publicly accessible.
Live Web Proxy Crawls
web: 565,648 views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 487,851 views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 502,165 views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 423,846 views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 10.7M views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 587,312 views, 0 favorites, 0 comments

International News Crawl started in September 2010
collection: 1,083 items, 45.6M views

Crawl of International News Sites with initial seedlist and crawler configuration from Sep 1, 2010.
Live Web Proxy Crawls
web: 415,785 views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 569,055 views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 539,289 views, 0 favorites, 0 comments

Government Web & Data Archive
collection: 6,269 items, 21.5M views

This collaborative project is an extension of the 2016 End of Term project, intended to document the federal government's web presence by archiving government websites and data. As part of this preservation effort, URLs supplied by partner institutions, as well as those nominated by the public, will be crawled regularly to provide an ongoing view of federal agencies' web and social media presence. Key partners on this effort are the Environmental Data & Governance...
Topics: government, data, federal, congress
survey_net00001
collection: 170 items, 28.1M views

Survey crawl of .net domains started October 2011.
Topics: webwidecrawl, net
End Of Term 2016 UNT Crawls
collection: 1,275 items, 17M views

End of Term 2016 Web Archive government web crawls by project partner the University of North Texas.
Topics: end of term, federal government, 2016, president, congress, university of north texas
Live Web Proxy Crawls
web: 4M views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 429,904 views, 0 favorites, 0 comments

Wide Crawl Number 16: Started June 3rd, 2017 - Still running
web: 303,187 views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Sun Oct 15 11:28:54 PDT 2017 to Sun Oct 15 06:00:00 PDT 2017.
Topic: crawldata
Live Web Proxy Crawls
web: 419,467 views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 415,254 views, 0 favorites, 0 comments

YouTube 2007 Crawl
collection: 73,847 items, 24.3M views

Data crawled from YouTube.com in 2007 by Internet Archive. These files are not currently accessible.
Crawl Data
collection: 32,980 items, 23.7M views

Crawl Data. This data is currently not publicly accessible.
crawl_UNK
collection: 32,949 items, 23.7M views

Crawl data. This data is currently not publicly accessible.
Live Web Proxy Crawls
web: 423,004 views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 365,297 views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 306,040 views, 0 favorites, 0 comments

Wide Crawl started September 2010
collection: 332 items, 19.1M views

Web wide crawl with initial seedlist and crawler configuration from September 2010.
Wide Crawl Number 18
web: 400,922 views, 0 favorites, 0 comments

Internet Archive crawl data from the wide crawl number 18, captured by crawl808.us.archive.org:wide18 from Thu Aug 12 22:36:46 PDT 2021 to Thu Aug 12 16:30:58 PDT 2021.
Topic: crawldata
collection: 21.1M views

End of term 2008 crawl data gathered by Internet Archive on behalf of the California Digital Library. This data is currently not publicly accessible.
Shallow Crawl Started February 2013
collection: 86 items, 20.3M views

Shallow crawl started February 2013 that collects content 1 level deep, including embeds. Access to content is restricted. Please visit the Wayback Machine to explore archived web sites.
Live Web Proxy Crawls
web: 213,151 views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 987,413 views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 197,869 views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 226,360 views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 665,882 views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 22.3M views, 8 favorites, 1 comment

Internet Archive Liveweb Capture from WaybackMachine, captured by wwwb-proxy0.us.archive.org:wbm from Sun Mar 27 22:10:09 PDT 2011 to Mon Mar 28 05:27:05 PDT 2011.
1 review
Topic: crawldata
2004 Election
collection: 178 items, 16.4M views

2004 Election crawl performed by Internet Archive. This data is currently not publicly accessible.
World Wars Crawl
collection: 13 items, 16.7M views

Web data related to World Wars I and II collected by Internet Archive in an experimental crawl sponsored by National Endowment for the Humanities and JISC. This data is currently not publicly accessible.
Live Web Proxy Crawls
web: 972,514 views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 3.9M views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 6.1M views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 743,435 views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 281,872 views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 1.7M views, 0 favorites, 0 comments

Internet Archive Liveweb Capture from WayBack Machine, captured by wwwb-liveweb.us.archive.org from 2012-06-04T16:05:30 UTC to 2012-06-05T00:00:01 UTC.
Topic: crawldata
Live Web Proxy Crawls
web: 336,859 views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 333,304 views, 0 favorites, 0 comments

Shallow Crawl Started November 2012
collection: 137 items, 13.1M views

Shallow crawl started November 2012 that collects content 1 level deep, including embeds. This data is currently not publicly accessible.
Live Web Proxy Crawls
web: 3.2M views, 0 favorites, 0 comments

Hurricane Katrina
collection: 112 items, 12.5M views

Data related to Hurricane Katrina collected in 2005 by Internet Archive. This data is currently not publicly accessible. From Wikipedia: Hurricane Katrina was the deadliest and most destructive Atlantic hurricane of the 2005 Atlantic hurricane season. It was the costliest natural disaster, as well as one of the five deadliest hurricanes, in the history of the United States. Among recorded Atlantic hurricanes, it was the sixth strongest overall. At least 1,833 people died in the hurricane and...
Live Web Proxy Crawls
web: 1M views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl427.us.archive.org:wide from Sun Oct 16 02:25:38 PDT 2016 to Sat Oct 15 21:19:02 PDT 2016.
Topic: crawldata
Live Web Proxy Crawls
web: 4.4M views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 13.4M views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 246,830 views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 349,849 views, 0 favorites, 0 comments

Wide Crawl Number 17: Started August 3rd, 2018
web: 7.6M views, 3 favorites, 8 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl805.us.archive.org:wide from Mon May 13 17:55:38 PDT 2019 to Mon May 13 13:45:55 PDT 2019.
8 reviews
Topic: crawldata
Live Web Proxy Crawls
web: 1.2M views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 998,452 views, 0 favorites, 0 comments

End of Term 2008 UNT Dot Gov Crawl
collection: 282 items, 12.6M views

End of term 2008 crawl of .gov domains gathered by University of North Texas. This data is currently not publicly accessible. UNT is a student-focused, public research university located in Denton, Texas.
Live Web Proxy Crawls
web: 691,720 views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 234,963 views, 0 favorites, 0 comments

Shallow Crawl Started 2013
web: 453,703 views, 0 favorites, 0 comments

Internet Archive crawldata from Shallow Webwide Crawl, captured by crawl455.us.archive.org:shallow from Sun Jun 9 02:57:23 PDT 2013 to Sat Jun 8 21:30:29 PDT 2013.
Topic: crawldata
Wide Crawl started February 2014
web: 9.9M views, 0 favorites, 0 comments

Internet Archive crawldata from Webwide Crawl, captured by crawl453.us.archive.org:wide from Wed Feb 19 01:09:37 PST 2014 to Tue Feb 18 21:33:27 PST 2014.
Topic: crawldata
Wide Crawl started February 2014
web: 9.7M views, 0 favorites, 1 comment

Internet Archive crawldata from Webwide Crawl, captured by crawl426.us.archive.org:wide from Wed Feb 19 07:58:38 PST 2014 to Wed Feb 19 05:13:46 PST 2014.
1 review
Topic: crawldata
Live Web Proxy Crawls
web: 900,064 views, 0 favorites, 0 comments

Live Web Proxy Crawls
web: 1.4M views, 0 favorites, 0 comments